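The Adaptive Moment Estimation (Adam) optimizer is widely used in deep learning tasks because of its fast convergence. However, the convergence of Adam is still not well understood. In particular, existing analyses of Adam cannot clearly demonstrate its advantage over SGD. We attribute this theoretical embarrassment to the $L$-smooth condition (i.e., the assumption that the gradient is globally Lipschitz continuous with constant $L$) adopted in the literature, which has been pointed out to often fail in practical neural networks. To tackle this embarrassment, we analyze the convergence of Adam under a relaxed condition called the $(L_0,L_1)$ smoothness condition, which allows the gradient Lipschitz constant to change with the local gradient norm. The $(L_0,L_1)$ condition is strictly weaker than the $L$-smooth condition, and it has been empirically verified to hold for practical deep neural networks. Under the $(L_0,L_1)$ smoothness condition, we establish the convergence of Adam with practical hyperparameters. Specifically, we argue that Adam can adapt to the local smoothness condition, which justifies the \emph{adaptivity} of Adam. In contrast, SGD can be arbitrarily slow under this condition. Our result may shed light on the benefit of adaptive gradient methods over non-adaptive ones.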
translated by 谷歌翻译
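As a toy illustration of the adaptivity claim in the Adam abstract above (not the paper's analysis or experiments), the sketch below compares one-dimensional SGD and the standard Adam update on $f(x) = x^4$. This function is not globally $L$-smooth, but it satisfies $|f''(x)| \le 12 + 3|f'(x)|$, one common form of the $(L_0,L_1)$ condition. The function names, starting point, and hyperparameters are assumptions chosen for illustration.

```python
import math

def grad(x):
    """Gradient of the toy objective f(x) = x**4."""
    return 4.0 * x * x * x

def run_sgd(x, lr, steps):
    """Plain gradient descent with a fixed learning rate."""
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def run_adam(x, lr, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam update with bias correction, in one dimension."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)           # bias-corrected moments
        v_hat = v / (1.0 - beta2 ** t)
        x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

if __name__ == "__main__":
    # Same start and learning rate for both methods (illustrative values).
    print("SGD after 3 steps:  ", run_sgd(10.0, lr=0.1, steps=3))
    print("Adam after 50 steps:", run_adam(10.0, lr=0.1, steps=50))
```

With these values the fixed-step SGD iterate blows up because the gradient grows with $|x|^3$, so SGD would need a much smaller learning rate to remain stable, whereas Adam's normalized step stays on the order of the learning rate.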
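Recently, generalization on out-of-distribution (OOD) data with correlation shift has attracted great attention. Correlation shift is caused by spurious attributes that correlate with the class label, since the correlation between them may differ between training and test data. For this problem, we show that models that are conditionally independent of the spurious attributes, given the class label, are OOD generalizable. Based on this, we propose a metric, Conditional Spurious Variation (CSV), which controls the OOD generalization error and measures such conditional independence. To improve OOD generalization, we regularize the training process with the proposed CSV. Under mild assumptions, our training objective can be formulated as a nonconvex-concave mini-max problem. An algorithm with a provable convergence rate is proposed to solve the problem. Extensive empirical results verify the efficacy of our algorithm in improving OOD generalization.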
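The Navier-Stokes equations are important partial differential equations that describe the motion of fluids such as liquids and air. Because of their importance, the development of efficient numerical schemes matters for both science and engineering. Recently, with the development of AI techniques, several approaches have been designed to integrate deep neural networks into simulating and inferring the fluid dynamics governed by the incompressible Navier-Stokes equations, which can accelerate the simulation or inference process in a mesh-free and differentiable way. In this paper, we point out that existing deep Navier-Stokes informed methods have limited ability to handle non-smooth or fractional equations, which are two critical situations in practice. To this end, we propose the \emph{Deep Random Vortex Method} (DRVM), which combines a neural network with a random vortex dynamics system equivalent to the Navier-Stokes equation. Specifically, the random vortex dynamics motivates a Monte Carlo based loss function for training the neural network, which avoids computing derivatives through automatic differentiation. Therefore, DRVM not only can efficiently solve Navier-Stokes equations involving rough paths, non-differentiable initial conditions, and fractional operators, but also inherits the mesh-free and differentiable benefits of deep-learning-based solvers. We conduct experiments on the Cauchy problem, parametric solver learning, and the inverse problem for both 2-D and 3-D incompressible Navier-Stokes equations. The proposed method achieves accurate results for the simulation and inference of Navier-Stokes equations. Especially for cases that include singular initial conditions, DRVM significantly outperforms the existing PINN method.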
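Stochastic partial differential equations (SPDEs) are important tools for modeling dynamics in many areas, including atmospheric science and physics. Neural operators, generalizations of neural networks that can learn maps between infinite-dimensional spaces, are powerful tools for solving parametric PDEs. However, they lack the ability to model SPDEs, which usually have poor regularity due to the driving noise. Since the theory of regularity structures has achieved great success in analyzing SPDEs and provides the concept of model feature vectors that approximate SPDE solutions well, we propose the Neural Operator with Regularity Structure (NORS), which incorporates these feature vectors to model dynamics driven by SPDEs. We conduct experiments on a variety of SPDEs, including the dynamic $\Phi^4_1$ model and the 2-D stochastic Navier-Stokes equation. The results demonstrate that NORS is resolution-invariant and efficient, and that it achieves an error one order of magnitude lower with a modest amount of data.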
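Learning dynamics governed by differential equations is crucial for predicting and controlling systems in science and engineering. The Neural Ordinary Differential Equation (NODE), a deep learning model integrated with differential equations, has recently become popular for learning dynamics because of its robustness to irregular samples and its flexibility with high-dimensional inputs. However, the training of NODE is sensitive to the precision of the numerical solver, which makes the convergence of NODE unstable, especially for ill-conditioned dynamical systems. In this paper, to reduce the reliance on the numerical solver, we propose to enhance the supervised signal in the training of NODE. Specifically, we pre-train a neural differential operator (NDO) to output an estimate of the derivatives, which serves as an additional supervised signal. The NDO is pre-trained on a class of basis functions and learns the mapping from trajectory samples of these functions to their derivatives. To leverage both the trajectory signal and the estimated derivatives from the NDO, we propose an algorithm called NDO-NODE, in which the loss function contains two terms: the fit to the true trajectory samples and the fit to the estimated derivatives output by the pre-trained NDO. Experiments on various dynamics show that the proposed NDO-NODE can consistently improve forecasting accuracy with a single pre-trained NDO. Especially for stiff ODEs, we observe that NDO-NODE captures the transitions in the dynamics more accurately than other regularization methods.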
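The two-term loss described in the abstract above can be sketched on a toy problem. The sketch assumes a one-parameter model $dx/dt = -\theta x$, an explicit Euler solver, and a finite-difference estimator that stands in for the pre-trained NDO; all function names and the weighting parameter `weight` are hypothetical and are not taken from the paper.

```python
import math

def euler_trajectory(theta, x0, dt, steps):
    """Integrate the toy model dx/dt = -theta * x with explicit Euler."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * (-theta * xs[-1]))
    return xs

def estimate_derivatives(xs, dt):
    """Stand-in for the pre-trained NDO (assumption): central differences,
    falling back to one-sided differences at the two endpoints."""
    n = len(xs)
    estimates = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        estimates.append((xs[hi] - xs[lo]) / ((hi - lo) * dt))
    return estimates

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

def two_term_loss(theta, observed, dt, weight=1.0):
    """Trajectory fit plus fit to the estimated derivatives."""
    predicted = euler_trajectory(theta, observed[0], dt, len(observed) - 1)
    trajectory_term = mse(predicted, observed)
    model_derivs = [-theta * x for x in observed]   # model vector field at the samples
    derivative_term = mse(model_derivs, estimate_derivatives(observed, dt))
    return trajectory_term + weight * derivative_term

if __name__ == "__main__":
    dt = 0.01
    observed = [math.exp(-2.0 * i * dt) for i in range(101)]  # true theta = 2
    for theta in (1.0, 2.0, 3.0):
        print(theta, two_term_loss(theta, observed, dt))
```

Note that the derivative term does not require running the solver: it evaluates the model's vector field directly at the observed samples, which is one way to read the abstract's goal of reducing reliance on the numerical solver.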
Knowledge graphs (KGs) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KG and the methods. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
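For readers unfamiliar with triplet losses, the sketch below shows the generic margin-based form, $\max(0,\ d(a,p) - d(a,n) + \text{margin})$, on plain feature vectors. It is only an assumption about the general family: the abstract does not specify how the category-oriented variant selects anchors, positives, and negatives, and the function names and margin value here are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss: pull the positive toward the anchor and
    push the negative at least `margin` farther away than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

if __name__ == "__main__":
    anchor = [0.0, 0.0]
    same_class = [0.0, 1.0]    # e.g. a feature close to the anchor's category center
    other_class = [3.0, 4.0]   # e.g. a feature from a different category
    print(triplet_loss(anchor, same_class, other_class))  # margin already satisfied
    print(triplet_loss(anchor, other_class, same_class))  # margin violated
```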
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from a trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solver, suggest that this challenge may be overcome. In this emerging direction, the Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest that KoopmanLab has the potential to be used in diverse applications of partial differential equations.
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
The Transformer has achieved impressive success in various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such a dataset is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement from ImageNet pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach designed specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, namely the online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To make the most of the Transformer given limited medical data, we propose an auxiliary difficulty ranking task, in which the Transformer is required to identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer learns to distill transformation-invariant features from the perturbed tokens in order to simultaneously measure difficulty and maintain the consistency of the self-supervised representations. BOLT is evaluated on three medical image processing tasks: skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification over ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.